Abstract

The rank problem in succinct data structures asks to preprocess an array $A[1..n]$ of bits into
a data structure using as close to $n$ bits as possible, and answer queries of the form $\mathrm{Rank}(k) = \sum_{i=1}^{k} A[i]$. The problem has been intensely studied, and features as a subroutine in a majority
of succinct data structures.

We show that in the cell-probe model with $w$-bit cells, if rank takes $t$ time, the space of the
data structure must be at least $n + n/w^{O(t)}$ bits. This redundancy/query trade-off is essentially
optimal, matching our upper bound from [FOCS'08].



1 Introduction 

1.1 The Complexity of Rank 

Consider an array $A[1..n]$ of bits. Can we preprocess this array into a data structure of size $n + r$
bits, for small redundancy $r$, which supports rank queries $\mathrm{Rank}(k) = \sum_{i=1}^{k} A[i]$ efficiently? The
problem of supporting rank (and the related select queries) is the bread-and-butter of succinct data
structures. It finds use in most other data structures (for representing trees, graphs, suffix trees/suffix
arrays, etc.), and its redundancy/query trade-off has received considerable attention.
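As a concrete reference point, the query semantics can be sketched directly in Python. This is a naive illustration only (it stores nothing succinctly and is not any data structure discussed here); the function names are hypothetical. It also records the fact used later in the proof that the array is recoverable from all $n$ answers, since $A[i] = \mathrm{Rank}(i) - \mathrm{Rank}(i-1)$.

```python
from itertools import accumulate
from typing import List

def all_ranks(A: List[int]) -> List[int]:
    """Return [Rank(1), ..., Rank(n)] with Rank(k) = A[1] + ... + A[k].

    A is a 0-indexed Python list, so A[0] plays the role of A[1] in the text.
    """
    return list(accumulate(A))

def rank(A: List[int], k: int) -> int:
    """Rank(k) for 1 <= k <= n, computed naively by scanning k bits."""
    return sum(A[:k])

def array_from_ranks(ranks: List[int]) -> List[int]:
    """Invert all_ranks: A[i] = Rank(i) - Rank(i - 1), with Rank(0) = 0."""
    return [cur - prev for prev, cur in zip([0] + ranks[:-1], ranks)]
```

For example, `all_ranks([1, 0, 1, 1, 0, 1])` is `[1, 1, 2, 3, 3, 4]`, and feeding that list to `array_from_ranks` gives back the original array.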

Rank already had a central position in the seminal papers on succinct data structures. Jacobson
[Jac89], in FOCS'89, and Clark and Munro [CM96], in SODA'96, gave the first data structures using
space $n + o(n)$ and constant query time. These results were slightly improved in [Mun96, MRR01,
RRR02].

In several applications, the set of ones is not dense in the array. Thus, the problem was
generalized to storing an array $A[1..u]$, containing $n$ ones and $u - n$ zeros. The optimal space
is $B = \lg\binom{u}{n}$. Pagh [Pag01] achieved space B + O(n · ^jf^) for this sparse problem. Recently,
Golynski et al. [GGG+07] achieved B + O(n · W^). Subsequently, Golynski et al. [GRR08] have
achieved space B + O(n · lg lg " g ^").
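The information-theoretic minimum $B = \lg\binom{u}{n}$ is easy to evaluate exactly. The Python sketch below (the helper name `optimal_space_bits` is ours) computes it and checks the standard estimates $n\lg(u/n) \le B \le n\lg(eu/n)$, which follow from $(u/n)^n \le \binom{u}{n} \le (eu/n)^n$.

```python
import math

def optimal_space_bits(u: int, n: int) -> float:
    """B = lg C(u, n): bits needed to single out one of the C(u, n) arrays
    A[1..u] that contain exactly n ones."""
    return math.log2(math.comb(u, n))

if __name__ == "__main__":
    u, n = 1 << 20, 1 << 10
    B = optimal_space_bits(u, n)
    lower = n * math.log2(u / n)             # from C(u, n) >= (u/n)^n
    upper = n * math.log2(math.e * u / n)    # from C(u, n) <= (e*u/n)^n
    print(f"B = {B:.1f} bits, versus u = {u} bits for the plain bit vector")
    print(f"{lower:.1f} <= B <= {upper:.1f}")
```

In the sparse regime ($n \ll u$), $B$ is far below the $u$ bits of a plain bit vector, which is why the redundancy is measured against $B$ rather than $u$.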

In my paper from FOCS'08 [Pat08], I gave a qualitative improvement to these bounds, showing
an exponential dependence between the query time and the redundancy. Specifically, with query
time $O(t)$, the achievable redundancy is $r \le n/(\frac{\lg n}{t})^t$. This improved the redundancy for many
succinct data structures where rank/select queries were the bottleneck.

Given the surprising nature of this improvement, a natural question is whether we can do much 
better. In this paper, we show that we cannot, at least for the basic rank queries: 



Theorem 1. In the cell-probe model with words of $w \ge \lg n$ bits, a data structure that supports
rank queries in $t$ cell probes requires at least $n + n/w^{O(t)}$ bits of space.

All succinct data structure papers assume $w = \lg n$. The lower bound matches my upper
bound, except for the difference between $(\lg n)^t$ and $(\frac{\lg n}{t})^t$. This difference is inconsequential
for small $t < \lg n$. If we want a polynomially small redundancy (say, less than $n^a$, for some
constant $a < 1$), the upper bound says that $t = O(\lg n)$ is sufficient. The lower bound says that
$t = \Omega(\lg n/\lg\lg n)$ is necessary. It is unclear which bound is the optimal one in this regime.
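To see the two regimes side by side, one can work in logarithms with $L = \lg n$ and $w = \lg n$: the upper bound reaches redundancy $n^a$ once $t\lg(L/t) \ge (1-a)L$, while a lower bound of the form $n/w^{ct}$ forces $c\,t\lg L \ge (1-a)L$. The Python sketch below finds the smallest such $t$ for each. The function names are ours, and the constant $c$ hidden in $w^{O(t)}$ is not given in the text, so $c = 1$ is only a placeholder.

```python
import math
from typing import Optional

def min_t_upper(L: float, a: float) -> Optional[int]:
    """Smallest integer t < L with n / (L/t)^t <= n^a, where L = lg n.

    Taking lg of both sides: t * lg(L / t) >= (1 - a) * L.
    Returns None if no t < L satisfies the inequality.
    """
    target = (1 - a) * L
    t = 1
    while t < L:
        if t * math.log2(L / t) >= target:
            return t
        t += 1
    return None

def min_t_lower(L: float, a: float, c: float = 1.0) -> int:
    """Smallest t with n / (lg n)^(c*t) <= n^a, i.e. c * t * lg L >= (1 - a) * L.

    c is a placeholder for the unknown constant in w^{O(t)} (here w = lg n).
    """
    return math.ceil((1 - a) * L / (c * math.log2(L)))
```

For instance, with $L = 1024$ and $a = 1/2$, `min_t_lower(1024, 0.5)` is 52 (that is, $\lceil 512/10 \rceil$) while `min_t_upper(1024, 0.5)` is 256, i.e. $L/4$; the ratio between the two reflects the $\lg\lg n$ factor separating $O(\lg n)$ from $\Omega(\lg n/\lg\lg n)$.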

1.2 Lower Bounds for Succinct Data Structures 

Much work in lower bounds for succinct data structures has been in the so-called systematic model.
In this model, the array $A$ must be represented as is, i.e. the data structure only has oracle
access to it (it can read any $w$ consecutive bits at $O(1)$ cost). In addition, the data structure
may store an index of sublinear size, which the query algorithm can examine at no cost. See
[GM03, Mil05, GRR08, Gol07] for increasingly tight lower bounds in this model. Note, however,
that in the systematic model, the best achievable redundancy with query time $t$ is $n/(t \cdot \mathrm{polylg}\, n)$, i.e. there
is a linear trade-off between redundancy and query time. This is significantly improved by my
(non-systematic) upper bounds [Pat08], and these lower bounds qualitatively miss the nature of
this improvement.

In the unrestricted cell-probe model, the first lower bounds were shown by Gal and
Miltersen [GM03] in 2003. These lower bounds were strong, showing a linear dependence between
the query time and the redundancy: $r \cdot t = \Omega(n/\lg n)$. However, the problem being analyzed is somewhat
unnatural: the bound applies to polynomial evaluation, for which nontrivial succinct upper bounds
appear unlikely. Their technique, which is based on the strong error correction implicit in their
problem, remains powerless for "easier" problems. (Thus, succinct data structures are unusual
for lower bounds, in that the difficult goal seems to be proving lower lower bounds for natural
problems.)

A significant breakthrough occurred in SODA'09, when Golynski [Gol09] showed a lower bound
of $r \cdot t^2 = \Omega(n)$ for the problem of storing a permutation and querying $\pi(\cdot)$ and $\pi^{-1}(\cdot)$. This
quadratic trade-off is tight for storing a permutation and its inverse. Golynski's technique is based
on the inherent difficulty of storing a function and its inverse without doubling the space. However,
due to the particular attention it pays to inverses, it is unclear how it could generalize to problems
like rank.

In this paper, we make further progress on getting lower bounds for natural problems, and 
analyze one of the central problems in succinct data structures. It is reasonable to hope that our 
lower bound technique will generalize to many other problems, given the many applications of rank 
queries. 

2 The Proof 

2.1 An Entropy Bound 

The structure of the rank problem is not particularly important in the lower bound proof. All that 
is needed is an inequality on the entropy of rank queries that we describe here. Essentially, the 
lower bound applies to any problem which satisfies a similar entropy condition. 
The possible queries come from the universe $[n]$. Imagine that this universe is divided into $k$
blocks of equal size (the remainder is ignored if $k$ doesn't divide $n$). Let $Q_\Delta \subseteq [n]$ be the set
containing the $\Delta$-th query (counting from zero) in each block. For a set $Q$ of queries, let $\mathrm{Ans}(Q)$
be the vector of answers to the queries in $Q$. We treat $\mathrm{Ans}(Q)$ as a random variable, depending on
the random choice of the input $A[1..n]$.
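A small Python sketch of these definitions follows. The indexing convention is our choice for concreteness (queries numbered $1..n$, block $b$ consisting of queries $bs+1, \dots, bs+s$ with $s = \lfloor n/k \rfloor$), and the helper names `query_set` and `answers` are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple

def query_set(n: int, k: int, delta: int) -> List[int]:
    """Q_delta: the delta-th query (counting from zero) of each of the k blocks.

    Assumed convention: queries are numbered 1..n, block b covers queries
    b*s + 1 .. b*s + s with s = n // k, and any remainder is ignored.
    """
    s = n // k
    if not 0 <= delta < s:
        raise ValueError("delta must lie in [0, n // k)")
    return [b * s + 1 + delta for b in range(k)]

def answers(A: List[int], Q: List[int]) -> Tuple[int, ...]:
    """Ans(Q): the vector (Rank(q)) for q in Q, where Rank(q) = A[1] + ... + A[q]."""
    prefix = list(accumulate(A, initial=0))  # prefix[q] == Rank(q)
    return tuple(prefix[q] for q in Q)
```

For example, with $n = 12$ and $k = 3$, `query_set(12, 3, 0)` is `[1, 5, 9]` and `query_set(12, 3, 2)` is `[3, 7, 11]`.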

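Lemma 2. Let $A$ be chosen uniformly at random in $\{0,1\}^n$, and let $\Delta$ and any $Q^* \subseteq Q_\Delta$ be
arbitrary. Then, for any event $\mathcal{E}$ with $\Pr[\mathcal{E}] \ge 2^{-\varepsilon|Q^*|}$ for a small enough constant $\varepsilon$, we have:

$$H(\mathrm{Ans}(Q_0) \mid \mathcal{E}) + H(\mathrm{Ans}(Q^*) \mid \mathcal{E}) - H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q^*) \mid \mathcal{E}) = \Omega(|Q^*|)$$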

Proof. Let us ignore the conditioning on $\mathcal{E}$ for now. The lemma says that representing the answers
to the queries $Q_0$ and (a subset of) $Q_\Delta$ separately loses $\Omega(1)$ bits of entropy per block compared
to the optimal joint encoding.

Let $h_m$ be the entropy of the binomial distribution on $m$ unbiased trials. The entropy $H(\mathrm{Ans}(Q_0))$
is exactly equal to $k \cdot h_{n/k}$: the answer of a query minus the answer of the previous one is exactly a
binomial on $n/k$ random bits. In all blocks that do not contain an element of $Q^*$, the contribution
of the block to $H(\mathrm{Ans}(Q_0))$ is cancelled by its contribution to $H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q^*))$.

Blocks that contain an element from $Q^*$ (except the first block) contribute:

• $h_{n/k}$ to $H(\mathrm{Ans}(Q_0))$;

• at least $h_{n/k}$ to $H(\mathrm{Ans}(Q^*))$. The contribution is more if the previous block did not contain
an element from $Q^*$;

• exactly $h_\Delta + h_{n/k-\Delta}$ to $H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q^*))$.

Thus, the block contributes $2h_{n/k} - h_\Delta - h_{n/k-\Delta}$ to the sum. Using the known estimation
$h_m = \frac{1}{2}\ln(\frac{\pi e}{2} m) + O(\frac{1}{m})$, this quantity is minimized when $\Delta = \frac{n}{2k}$ and is always at least $\ln 2 - o(1)$.
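The minimization can be checked numerically. This Python sketch computes $h_m$ exactly from the binomial probabilities (in nats, matching the $\ln 2$ above) and evaluates $2h_{N} - h_\Delta - h_{N-\Delta}$ for every $\Delta$, with an illustrative block size $N = n/k = 400$ chosen by us; the function names are hypothetical.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def binom_entropy(m: int) -> float:
    """h_m: entropy, in nats, of Binomial(m, 1/2), computed from the exact pmf."""
    h = 0.0
    for j in range(m + 1):
        p = math.comb(m, j) / 2 ** m
        if p > 0.0:  # skip terms that would underflow to 0.0 in floating point
            h -= p * math.log(p)
    return h

def block_gain(N: int, delta: int) -> float:
    """2 h_N - h_delta - h_{N - delta}: one block's contribution, with N = n/k."""
    return 2 * binom_entropy(N) - binom_entropy(delta) - binom_entropy(N - delta)

if __name__ == "__main__":
    N = 400
    gains = {d: block_gain(N, d) for d in range(1, N)}
    best = min(gains, key=gains.get)
    print(best, gains[best], math.log(2))
```

With $N = 400$ the minimum sits at $\Delta = 200 = N/2$, and its value is within $0.01$ of $\ln 2$, consistent with the estimate above.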

The fact that conditioning on $\mathcal{E}$ does not change the result comes from a standard independence
trick in lower bounds. We decomposed $H(\mathrm{Ans}(Q_0)) + H(\mathrm{Ans}(Q^*)) - H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q^*))$
as the sum of $|Q^*|$ independent variables (essentially¹). Each component was $\Omega(1)$ with constant
probability. By a Chernoff bound, the sum is $\Omega(|Q^*|)$ except with probability $2^{-\Omega(|Q^*|)}$. Thus, even if
we condition on an event of probability $2^{-\varepsilon|Q^*|}$, the sum must remain $\Omega(|Q^*|)$ with overwhelming
probability. □
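For tiny inputs, the unconditioned quantity in Lemma 2 can be computed exactly by enumerating all $2^n$ arrays. The sketch below takes $Q^* = Q_\Delta$ under an indexing convention we chose (queries $1..n$, block size $s = \lfloor n/k \rfloor$, $Q_\Delta = \{bs + 1 + \Delta\}$) and compares the result with a closed form obtained by splitting the array at the query positions: each of blocks $2, \dots, k$ contributes $2h_s - h_\Delta - h_{s-\Delta}$, as in the proof, and under this convention the first block contributes the boundary term $h_{1+\Delta} - h_\Delta$. All names are hypothetical.

```python
import math
from collections import Counter
from functools import lru_cache
from itertools import accumulate, product

@lru_cache(maxsize=None)
def binom_entropy(m: int) -> float:
    """h_m in nats (exact) for Binomial(m, 1/2); m is small here, so no underflow."""
    return -sum(p * math.log(p) for p in (math.comb(m, j) / 2 ** m for j in range(m + 1)))

def entropy(counts: Counter, total: int) -> float:
    """Entropy in nats of the distribution given by outcome counts out of `total`."""
    return -sum(c / total * math.log(c / total) for c in counts.values())

def mutual_information(n: int, k: int, delta: int) -> float:
    """H(Ans(Q_0)) + H(Ans(Q_delta)) - H(Ans(Q_0), Ans(Q_delta)) over all 2^n inputs."""
    s = n // k
    q0 = [b * s + 1 for b in range(k)]
    qd = [b * s + 1 + delta for b in range(k)]
    c0, cd, joint = Counter(), Counter(), Counter()
    for bits in product((0, 1), repeat=n):
        prefix = list(accumulate(bits, initial=0))  # prefix[q] == Rank(q)
        a0 = tuple(prefix[q] for q in q0)
        ad = tuple(prefix[q] for q in qd)
        c0[a0] += 1
        cd[ad] += 1
        joint[(a0, ad)] += 1
    total = 2 ** n
    return entropy(c0, total) + entropy(cd, total) - entropy(joint, total)

def predicted(n: int, k: int, delta: int) -> float:
    """(k - 1) * (2 h_s - h_delta - h_{s - delta}) + h_{1 + delta} - h_delta."""
    s = n // k
    gain = 2 * binom_entropy(s) - binom_entropy(delta) - binom_entropy(s - delta)
    return (k - 1) * gain + binom_entropy(1 + delta) - binom_entropy(delta)
```

For $n = 12$, $k = 3$, $\Delta = 2$ this enumerates 4096 arrays, and the two functions agree up to floating-point error.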

2.2 Cell-Probe Elimination 

To support the induction in our proof, we augment the cell-probe model with published bits. These
bits represent a memory of bounded size which the query algorithm can examine at no cost. Like
the regular memory (which must be examined through cell probes), the published bits are initialized
at construction time, as a function of the input $A[1..n]$. Observe that if we have $n$ published bits,
the problem can be solved trivially.

Our proof will try to publish a small number of cells from the regular memory which are accessed
frequently. Thus, the complexity of many queries will decrease by at least one. The argument is
then applied iteratively: the cell-probe complexity decreases as more and more bits are published.
If we arrive at zero cell probes and fewer than $n$ published bits, we have a contradiction.

¹ The careful reader has probably noticed that we actually decomposed it into two sums, each of which has $|Q^*|$
terms that are independent among themselves; however, the two sums are dependent. In the first sum, we are
subtracting the entropy of sub-blocks of size $\Delta$ from the entropy of blocks of size $n/k$; in the second sum, the
entropy of sub-blocks of size $n/k - \Delta$ from the entropy of blocks of size $n/k$. The analysis proceeds by a union
bound over the two sums.

Let $\mathrm{Probes}(q)$ be the set of cells probed by query $q$; this is a random variable, since the query
can be adaptive. Also let $\mathrm{Probes}(Q) = \bigcup_{q \in Q} \mathrm{Probes}(q)$.

The main technical result in our proof is captured in the following lemma, the proof of which 
appears in the next section: 

Lemma 3. Assume a data structure uses $P = o(n)$ published bits, and at most $n$ memory bits.
Break the queries into $k = \gamma \cdot P$ blocks, for a large enough constant $\gamma$. Then:

$$\Pr_{A,\, q \in [n]}\big[\mathrm{Probes}(q) \cap \mathrm{Probes}(Q_0) \neq \emptyset\big] = \Omega(1)$$

The lemma shows that $\mathrm{Probes}(Q_0)$ is a good set of cells to publish, since a constant fraction
of the queries probe at least one cell from this set.

Completing the proof is now easy. If the data structure has redundancy $r$, begin by publishing
some arbitrary $P_0 = r$ bits, to satisfy the condition that there are at most $n$ bits in regular memory.

In step $i = 0, 1, 2, \dots$, we let $k_i = \gamma \cdot P_i$, and publish the cells in $\mathrm{Probes}(Q_0)$, together with
their addresses. The number of published bits increases to $P_{i+1} = k_i \cdot (w + O(\lg n)) = O(P_i w)$. The
cell-probe complexity of an average query decreases by $\Omega(1)$.

Since the average-case complexity cannot go below zero, the number of iterations that we are
able to make must be $O(t)$. The only reason we may fail to make another iteration is a violation
of the lemma's condition $P = o(n)$. Thus, $P_{O(t)} = \Omega(n)$, that is $r \cdot w^{O(t)} \ge n$. This is the desired
trade-off.
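The geometric growth of the published bits can be traced numerically. The Python sketch below follows the recurrence as stated in the text, $P_{i+1} = k_i\,(w + O(\lg n))$ with $k_i = \gamma P_i$. The constant $\gamma$, the cutoff $\delta n$ standing in for the hypothesis $P = o(n)$, and the choice $\lceil\lg n\rceil$ for the address cost are illustrative assumptions, not values from the text; the function names are ours.

```python
import math

def next_published(P: float, n: int, w: int, gamma: float) -> float:
    """One round as stated in the text: k_i = gamma * P_i, and
    P_{i+1} = k_i * (w + O(lg n)), with the O(lg n) address cost taken as ceil(lg n)."""
    k = gamma * P
    return k * (w + math.ceil(math.log2(n)))

def max_iterations(r: float, n: int, w: int, gamma: float, delta: float) -> int:
    """Count the rounds that can be performed while the lemma's hypothesis
    P = o(n), modeled here as P <= delta * n, still holds (starting from P_0 = r)."""
    P, steps = float(r), 0
    while P <= delta * n:
        P = next_published(P, n, w, gamma)
        steps += 1
    return steps
```

With $n = 2^{30}$, $w = 30$, $\gamma = 4$ and $\delta = 0.01$, each round multiplies $P$ by $240$, and starting from $r = 1000$ the function reports 2 rounds. Since every round lowers the average query cost by $\Omega(1)$, at most $O(t)$ rounds can happen before $P$ exceeds $\delta n$, so $r \cdot (\gamma(w + \lceil\lg n\rceil))^{O(t)} > \delta n$, which has the shape $r \ge n/w^{O(t)}$.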

2.3 An Encoding Argument 

In this section, we prove Lemma 3. Our proof is an encoding argument: we show that, if the
conclusion of the lemma failed, we could encode a uniformly random $A$ using strictly fewer than $n$
bits.

Let $P$ and $k$ be as in our lemma's statement, and assume $\Pr_{A,\, q \in [n]}[\mathrm{Probes}(q) \cap \mathrm{Probes}(Q_0) \neq \emptyset] < \varepsilon$,
for a small enough constant $\varepsilon$. We thus know that a random query is very likely to probe no
cell of $\mathrm{Probes}(Q_0)$.

By averaging, there exists a $\Delta \in \{1, \dots, n/k\}$ such that $\Pr_{A,\, q \in Q_\Delta}[\mathrm{Probes}(q) \cap \mathrm{Probes}(Q_0) \neq \emptyset] < \varepsilon$.
We are only going to concentrate on the queries in $Q_\Delta$.

More specifically, we are going to concentrate on the queries that probe no cell from $\mathrm{Probes}(Q_0)$:
$Q^* = \{q \in Q_\Delta \mid \mathrm{Probes}(q) \cap \mathrm{Probes}(Q_0) = \emptyset\}$. Note that $\mathbb{E}_A[|Q^*|] \ge (1 - \varepsilon)k$.

Intuitively speaking, our contradiction is found as follows. The answers to queries $Q_0$ must be
encoded in the cells $\mathrm{Probes}(Q_0)$. The answers to queries $Q^*$ must be encoded in the cells $\mathrm{Probes}(Q^*)$,
which, by definition, is disjoint from $\mathrm{Probes}(Q_0)$. But the answers $\mathrm{Ans}(Q_0)$ and $\mathrm{Ans}(Q^*)$ are highly
correlated (by Lemma 2). Thus, if the two answers are written in disjoint sets of cells, a lot of
entropy is being wasted, which is impossible for a succinct data structure.

The footprint. We first formalize the intuitive notion of "the contents of cells $\mathrm{Probes}(Q)$." Define
the footprint $\mathrm{Foot}(Q)$ of a query set $Q$ by the following algorithm. We assume the published bits
are known in the course of the definition. Enumerate queries $q \in Q$ in increasing order. For each
query, simulate its execution one cell probe at a time. If a cell has already been included in the
footprint, ignore it. Otherwise, append the contents (but not the address) of the new cell to the
footprint. Observe that $\mathrm{Foot}(Q)$ is a string of exactly $|\mathrm{Probes}(Q)| \cdot w$ bits.

We observe that $\mathrm{Ans}(Q)$ is a function of $\mathrm{Foot}(Q)$ and the published bits. Indeed, we can
simulate the queries in order. At each step, we know how the query algorithm acts based on
the published bits and the previously read cells. Thus, we know the address of the next cell to be
read. We can check whether the cell was already in the footprint (since we also know the addresses
of previous cells). If not, we read the next $w$ bits of the footprint, which are precisely the contents
of this cell, and continue the simulation.
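The footprint construction and its decoding can be sketched in Python. Everything here is a hypothetical model for illustration: a query algorithm is a generator that yields cell addresses and receives cell contents, and the toy rank structure (`build`, `rank_algo`) stores per-chunk counts next to raw $w$-bit chunks, which is far from succinct and is not the data structure of [Pat08]. The point is that `answers_from_footprint` never sees an address; it recovers each one by re-running the queries, as argued above.

```python
from typing import Callable, Dict, Generator, List, Sequence, Tuple

# Hypothetical model of an adaptive query algorithm: a generator that yields the
# address of each cell it probes, receives that cell's contents via send(), and
# finally returns its answer. Published bits are passed in and read for free.
QueryAlgo = Callable[[int, Sequence[int]], Generator[int, int, int]]

def run(algo: QueryAlgo, q: int, published: Sequence[int],
        read: Callable[[int], int]) -> int:
    """Execute one query, answering each cell probe through `read`."""
    gen = algo(q, published)
    try:
        addr = next(gen)
        while True:
            addr = gen.send(read(addr))
    except StopIteration as stop:
        return stop.value

def footprint(algo: QueryAlgo, Q: Sequence[int], published: Sequence[int],
              memory: Sequence[int]) -> List[int]:
    """Foot(Q): contents (not addresses) of each newly probed cell, in probe order."""
    seen = set()
    foot: List[int] = []

    def read(addr: int) -> int:
        if addr not in seen:
            seen.add(addr)
            foot.append(memory[addr])
        return memory[addr]

    for q in sorted(Q):
        run(algo, q, published, read)
    return foot

def answers_from_footprint(algo: QueryAlgo, Q: Sequence[int],
                           published: Sequence[int], foot: Sequence[int]) -> List[int]:
    """Recover Ans(Q) from Foot(Q) and the published bits alone."""
    known: Dict[int, int] = {}  # addresses decoded so far -> their contents
    chunks = iter(foot)

    def read(addr: int) -> int:
        if addr not in known:
            known[addr] = next(chunks)  # the next w bits of the footprint
        return known[addr]

    return [run(algo, q, published, read) for q in sorted(Q)]

# Toy rank structure (illustration only): cells 0..m-1 hold w-bit chunks of A,
# cells m..2m-1 hold the number of ones before each chunk.
def build(A: List[int], w: int) -> Tuple[List[int], List[int]]:
    m = (len(A) + w - 1) // w
    chunks = [sum(bit << j for j, bit in enumerate(A[c * w:(c + 1) * w])) for c in range(m)]
    before = [sum(A[:c * w]) for c in range(m)]
    return chunks + before, [w, m]  # (memory cells, published bits)

def rank_algo(q: int, published: Sequence[int]) -> Generator[int, int, int]:
    """Rank(q) = A[1] + ... + A[q], using two cell probes."""
    w, m = published
    c = (q - 1) // w
    ones_before = yield m + c     # probe the count cell of chunk c
    chunk = yield c               # probe the chunk holding position q
    low = (q - 1) % w + 1         # number of bits of this chunk up to position q
    return ones_before + bin(chunk & ((1 << low) - 1)).count("1")
```

For instance, with a 64-bit array, $w = 8$ and queries 1, 17, 20, 33, 35, 49, the queries touch four chunks and four count cells, so the footprint holds 8 cells, and decoding it reproduces all six answers.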

The encoding. Our encoding for the array A will consist of the following: 

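1. the published bits ($P$ bits). Denote these bits by the random variable $\mathcal{P}$.

2. the identity of the set $Q^*$ as a subset of $Q_\Delta$. This uses $O(\lg\binom{k}{|Q^*|}) = O(\lg\binom{k}{k-|Q^*|})$ bits. By
concavity, the average length of this component is on the order of:

$$\mathbb{E}\Big[\lg\binom{k}{k-|Q^*|}\Big] \le \lg\binom{k}{\mathbb{E}[k-|Q^*|]} \le \lg\binom{k}{\varepsilon k} = k \cdot O(\varepsilon \lg \tfrac{1}{\varepsilon})$$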

3. the answers $\mathrm{Ans}(Q_0 \cup Q^*)$, encoded jointly. Using Huffman coding, this requires $H(\mathrm{Ans}(Q_0 \cup Q^*)) + O(1)$
bits on average.

4. the footprint $\mathrm{Foot}(Q_0)$, encoded optimally given the knowledge of $\mathrm{Ans}(Q_0)$ and the published
bits. This takes $H(\mathrm{Foot}(Q_0) \mid \mathrm{Ans}(Q_0), \mathcal{P}) + O(1)$ bits on average.

5. the footprint $\mathrm{Foot}(Q^*)$, encoded optimally given the knowledge of $Q^*$, $\mathrm{Ans}(Q^*)$, and the published
bits. This takes $H(\mathrm{Foot}(Q^*) \mid Q^*, \mathrm{Ans}(Q^*), \mathcal{P}) + O(1)$ bits on average.

6. all cells outside $\mathrm{Probes}(Q_0) \cup \mathrm{Probes}(Q^*)$, included verbatim with $w$ bits per cell. As noted
above, the cell addresses $\mathrm{Probes}(Q_0)$ and $\mathrm{Probes}(Q^*)$ can be decoded from $\mathrm{Foot}(Q_0)$, respectively
$\mathrm{Foot}(Q^*)$, and the published bits. Thus, we know exactly which cells to include in this
component. This part takes $n - \mathbb{E}[|\mathrm{Probes}(Q_0)| + |\mathrm{Probes}(Q^*)|] \cdot w$ bits on average.
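The estimate for item 2 rests on $\binom{k}{m} \le (ek/m)^m$ together with the fact that $m\lg(ek/m)$ is increasing for $m \le k$, so that $\lg\binom{k}{\lfloor\varepsilon k\rfloor} \le \varepsilon k\lg(e/\varepsilon) = k \cdot O(\varepsilon\lg\frac{1}{\varepsilon})$. A quick numerical check in Python, with helper names of our own:

```python
import math

def subset_cost_bits(k: int, eps: float) -> float:
    """lg C(k, floor(eps*k)): bits to name a subset of Q_Delta that misses
    floor(eps*k) of its k queries."""
    return math.log2(math.comb(k, math.floor(eps * k)))

def entropy_style_bound(k: int, eps: float) -> float:
    """eps*k*lg(e/eps), from C(k, m) <= (e*k/m)^m applied with m <= eps*k."""
    return eps * k * math.log2(math.e / eps)
```

The per-query cost $\varepsilon\lg(e/\varepsilon)$ tends to zero with $\varepsilon$ (about $0.08$ bits for $\varepsilon = 0.01$), which is why this term can be made small relative to the $\Omega(k)$ entropy gain.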

Observe that this encoding includes the published bits and all cells in the memory (though the
cells in $\mathrm{Probes}(Q_0)$ and $\mathrm{Probes}(Q^*)$ are included in a compressed format). Thus, all $n$ queries can
be simulated. If all $n$ answers are known, the array $A$ can be decoded. Thus, this is a valid
encoding of $A$.

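It remains to analyze the average size of the encoding. To bound item 4, we can write:

$$H(\mathrm{Foot}(Q_0) \mid \mathrm{Ans}(Q_0), \mathcal{P}) = H(\mathrm{Foot}(Q_0), \mathrm{Ans}(Q_0), \mathcal{P}) - H(\mathrm{Ans}(Q_0), \mathcal{P})$$

But $H(\mathrm{Foot}(Q_0), \mathrm{Ans}(Q_0), \mathcal{P}) = H(\mathrm{Foot}(Q_0), \mathcal{P})$, since the answers can be decoded from the
footprint and the published bits. Now note that $H(\mathrm{Foot}(Q_0), \mathcal{P}) \le \mathbb{E}[|\mathrm{Probes}(Q_0)|] \cdot w + P$, since this
is the size in bits of the footprint and the published bits. Finally, note that $H(\mathrm{Ans}(Q_0), \mathcal{P}) \ge H(\mathrm{Ans}(Q_0))$. Thus:

$$H(\mathrm{Foot}(Q_0) \mid \mathrm{Ans}(Q_0), \mathcal{P}) \le \mathbb{E}[|\mathrm{Probes}(Q_0)|] \cdot w + P - H(\mathrm{Ans}(Q_0))$$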

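Similarly, item 5 is bounded by:

$$H(\mathrm{Foot}(Q^*) \mid Q^*, \mathrm{Ans}(Q^*), \mathcal{P}) \le \mathbb{E}[|\mathrm{Probes}(Q^*)|] \cdot w + P + k \cdot O(\varepsilon \lg \tfrac{1}{\varepsilon}) - H(Q^*, \mathrm{Ans}(Q^*))$$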
Summing up all components, our encoding has expected size:

$$n + 3P + k \cdot O(\varepsilon \lg \tfrac{1}{\varepsilon}) + H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q^*)) - H(\mathrm{Ans}(Q_0)) - H(Q^*, \mathrm{Ans}(Q^*)) \tag{1}$$

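We can now rewrite:

$$\begin{aligned}
&H(\mathrm{Ans}(Q_0)) + H(Q^*, \mathrm{Ans}(Q^*)) - H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q^*)) \\
&\quad\ge H(\mathrm{Ans}(Q_0) \mid Q^*) + H(\mathrm{Ans}(Q^*) \mid Q^*) + H(Q^*) - H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q^*), Q^*) \\
&\quad= H(\mathrm{Ans}(Q_0) \mid Q^*) + H(\mathrm{Ans}(Q^*) \mid Q^*) + H(Q^*) - H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q^*) \mid Q^*) - H(Q^*) \\
&\quad= H(\mathrm{Ans}(Q_0) \mid Q^*) + H(\mathrm{Ans}(Q^*) \mid Q^*) - H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q^*) \mid Q^*) \\
&\quad= \mathbb{E}_{Q}\big[H(\mathrm{Ans}(Q_0) \mid Q^* = Q) + H(\mathrm{Ans}(Q) \mid Q^* = Q) - H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q) \mid Q^* = Q)\big]
\end{aligned}$$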

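We can now apply Lemma 2 for any fixed $Q$ and the event $\mathcal{E} = \{Q^* = Q\}$. Note that the probability
$\Pr[\mathcal{E}]$ is at least $2^{-k \cdot O(\varepsilon \lg(1/\varepsilon))}$ with constant probability over the choice of $Q$. Thus, the lemma applies for
small enough $\varepsilon$. We conclude that $H(\mathrm{Ans}(Q_0) \mid \mathcal{E}) + H(\mathrm{Ans}(Q) \mid \mathcal{E}) - H(\mathrm{Ans}(Q_0), \mathrm{Ans}(Q) \mid \mathcal{E}) = \Omega(k)$
with constant probability over $Q$. Since this quantity is a conditional mutual information, and hence never negative,
the expectation is also $\Omega(k)$.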

Plugging our result into (1), the size of the encoding becomes $n + 3P + k \cdot O(\varepsilon \lg \frac{1}{\varepsilon}) - \Omega(k)$.
Setting $k = \gamma P$ for a large constant $\gamma$, and $\varepsilon$ a small enough constant, the negative $\Omega(k)$ term is at least
double the positive terms. Thus, the encoding size is $n - \Omega(k)$, a contradiction.
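To make the last step concrete, one can write the hidden constants explicitly: say the entropy gain is at least $c\,k$ and the cost of naming $Q^*$ is at most $C\,\varepsilon\lg\frac{1}{\varepsilon}\cdot k$, where $c, C > 0$ are placeholders for the constants in the $\Omega(\cdot)$ and $O(\cdot)$ above (they are not specified in the text). Substituting $P = k/\gamma$:

```latex
n + 3P + C\varepsilon\lg\tfrac{1}{\varepsilon}\,k - c\,k
  = n + k\Big(\tfrac{3}{\gamma} + C\varepsilon\lg\tfrac{1}{\varepsilon} - c\Big)
  \le n - \tfrac{c}{2}\,k
\qquad\text{whenever}\qquad
\gamma \ge \tfrac{12}{c}
\quad\text{and}\quad
C\varepsilon\lg\tfrac{1}{\varepsilon} \le \tfrac{c}{4}.
```

Each of the two positive terms is then at most $\frac{c}{4}k$, so together they are at most half of the negative term $c\,k$, as claimed.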